Vector Processing-Aware Advanced Clock-Gating Techniques for Low-Power Fused Multiply-Add
Similar resources
A Review of Low Power Consumption Clock Gating Techniques
This paper reviews existing clock gating techniques for low power dissipation in digital circuit designs. The clock gating techniques considered reduce power consumption relative to an ungated implementation of the same design. A 16-bit ALU (arithmetic logic unit) serves as the design in which gating techniques reduce dynamic power consumption by shutting ...
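The idea behind clock gating can be illustrated with a toy activity count. This is only a sketch: the function name `clock_edges` and the enable pattern are hypothetical, and the count of clock edges reaching a register stands in for switching activity rather than modeling real power.

```python
def clock_edges(enables, gated):
    """Count the clock edges that reach a register over the given cycles.

    enables: one boolean per cycle, True when the register must load new data.
    gated:   if True, the clock is passed only on cycles where enable is True.
    """
    if gated:
        return sum(1 for en in enables if en)
    return len(enables)


# Hypothetical activity pattern: the register is needed in 3 of 8 cycles.
enables = [True, False, False, True, False, False, False, True]

ungated = clock_edges(enables, gated=False)  # clock toggles every cycle
gated = clock_edges(enables, gated=True)     # clock toggles only when enabled
saving = 1 - gated / ungated                 # fraction of clock edges avoided
```

The fewer cycles a block is actually needed, the larger the fraction of clock edges the gate suppresses.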
Mixed-precision Fused Multiply and Add
The standard floating-point fused multiply and add (FMA) computes R=AB+C with a single rounding. This article investigates a variant of this operator where the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). With minor modifications, this operator is...
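The effect of the single rounding in R=AB+C can be seen in a small software reference model. This is a sketch only, not the operator described in the article: `fma_reference` is a hypothetical helper that computes the exact result in rational arithmetic and rounds once to binary64, and the input values are chosen for illustration.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to binary64."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: the exact product 1 - 2**-60 rounds to 1.0,
# so the small term is lost before the addition.
unfused = a * b + c

# Fused: the exact value -2**-60 is representable and survives.
fused = fma_reference(a, b, c)
```

Here the separate multiply and add returns 0.0, while the fused result keeps the -2**-60 term that the intermediate rounding discards.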
Argument Reduction with a Fused Multiply-Add
The Cody and Waite argument reduction technique works perfectly for reasonably large arguments, but as the input grows, there are no bits left to approximate the constant with enough accuracy. Under mild assumptions, we show that the result computed with a fused multiply-add provides a fully accurate result for many possible values of the input with a constant almost accurate to the full workin...
Suggestions for Implementing a Fast IEEE Multiply-Add-Fused Instruction
We studied three possible strategies to overlap the operations in a floating-point add (FADD) and a floating-point multiply (FMPY) for implementing a multiply-add-fused (MAF) instruction, whose result would be compatible with the IEEE floating-point standard. The operations in FMPY and FADD are: (a) non-overlapped, (b) fully-overlapped, and (c) partially-overlapped. The first strategy correspon...
High-Level Synthesis for Minimum-Area Low-Power Clock Gating
Clock gating is one of the useful techniques for reducing the dynamic power consumption of synchronous sequential circuits. To reduce the power consumption of the clock tree, previous work has shown that clock control logic should be synthesized in the high-level synthesis stage. However, previous work may suffer from a large circuit area overhead on the clock control logic. In this paper, we present an I...
Journal
Journal title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year: 2018
ISSN: 1063-8210,1557-9999
DOI: 10.1109/tvlsi.2017.2784807